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COPYRIGHT DETECTION AND PROTECTION SYSTEM AND METHOD 
PRIORITY CLAIM 

5 This application claims the benefit of United States Provisional Application Serial 

No. 60/281,881, filed April 5, 2001, and United States Patent Application filed April 3, 
2002, Serial No. not yet available. 

BACKGROUND 

10 1. Field 

The field of the present invention relates to processing digital data. More 
particularly, the field of the present invention relates to identifying, reporting and/or 
protecting digital works from unauthorized transmission and/or copying, such as over 
networks or network segments connected to the Internet. 

1 5 2. Background 

Technological developments such as peer to peer file sharing have revolutionized the 
exchange of information over digital networks such as the Internet. The result has been a 
virtual explosion of copyright violations, as intellectual property is transmitted to individuals 
not licensed to receive and use it. Once copyrighted content is available on the Internet, that 

20 content is exposed to piracy. The unlicensed use of digital copyrighted works online is a 
growing, major concern to the owners of these properties. Current peer-to-peer file sharing 
technology facilitates widespread copyright infringement of various works including songs, 
images, and movies. At the same time, security measures placed into widespread use have 
been defeated. For example, DVD encryption was "hacked" by mid-2000, resulting in the 

25 digital copying and distribution of movies without regard for copyright. As a result of the 
widespread availability of digital works on computer networks, artists and companies 
affiliated with them receive no payment for distribution of copyrighted works on an 
unprecedented scale. 

In response to the growing copyright infringement problem tied to unregulated peer- 
30 to-peer file sharing, copyright owners have sought and obtained legal relief, including 

injunctive relief, against peer-to-peer facilitators such as Napster. Some copyright owners 
have further requested that network operators, including colleges and universities, block 
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5 affiliated with them receive no payment for distribution of copyrighted works on an 
unprecedented scale. 

In response to the growing copyright infringement problem tied to unregulated peer- 
to-peer file sharing, copyright owners have sought and obtained legal relief, including 
injunctive relief, against peer-to-peer facilitators such as Napster. Some copyright owners 

1 0 have further requested that network operators, including colleges and universities, block 
access to peer-to-peer sites to prevent further copyright infringement. At the same time, 
however, there exist substantial non-infringing uses for peer-to-peer file sharing, including 
exchange of creative works that exist in the public domain (such as may exist through 
expiration or abandonment of copyrights, for example) and/or uses that have been expressly 

1 5 permitted. If aggrieved copyright owners prevail in their legal battles against peer-to-peer 
facilitators, then such facilitators may be forced to stop operating irrespective of the content 
they provide. 

The injunction entered against Napster in March 2000 by a federal judge in San 
Francisco, California has ordered the company to remove copyrighted sound recordings 

20 from its system. The recording industry has been given the duty to provide lists containing 
the titles, names of artists, file names, and ownership rights of recordings, and Napster, 
shortly after receiving such identification, is responsible for blocking those materials from 
its system. Yet compliance with this name-based regime has already proven difficult, since 
there exists no file-naming standard and file names can be easily manipulated with known 

25 method presently in use. The inclusion of metadata (data about data, usually constituting 

text embedded in an audio file or stream to represent information such as artist name, album 
name, track name, etc.) in selected audio works may aid in identifying works even if file 
names are changed. However, metadata is only present on newer works, and essentially 
amounts to a more sophisticated extension of file naming technology that is subject to 

30 manipulation and hacking. 
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5 A potential alternative to relying on file naming technology for identifying digital 

works on computer networks is an identification technology known as watermarking. A 
watermark is digital information that is embedded into a file in such a way that it does not 
affect human perception of the content but is easily detectable by machines. One advantage 
offered by watermarking is its easy recognition. However, drawbacks of watermarking 

1 0 technology include its inability to protect the huge amount of previously released audio 

content, and its susceptibility to hacking. Once a watermark is disabled or removed from a 
creative work by a hacker, the resulting product is unprotected. 

A different identification technology known as content-based identification 
("CBID"), relying on the content of creative works, represents yet another alternative to file 

1 5 naming technology. For example, when applied to audio works, CBID analyzes acoustic 
qualities. Various CBID techniques may be used to characterize the qualities of sound 
perceived by a listener. A typical approach is to analyze the spectrum of a sound, such as by 
measuring the loudness of each frequency contained in a multi-frequency sound. 

A more compact CBID technology involves creation of a "fingerprint" from a 

20 creative work that is compact from a data perspective, yet preserves distinguishing 

characteristics that may be used to positively identify a unique audio file. Many simple 
fingerprinting methods have been developed, such as spectral averaging, for example. In 
using these simpler methods, however, a substantial amount of information about the audio 
work is lost. Great care must be taken in applying a particular CBID method for a number 

25 of reasons: not only to ensure only accurate identification, but also to ensure that 
compressed versions of an audio file can be identified, and to avoid known evasion 
techniques such as adding a small segment to the beginning of an audio file. A more 
sophisticated CBID technology would be appropriate to address these concerns. 

One structural application of a sophisticated CBID fingerprinting method for audio 

30 data is found in U.S. Patent No. 5,918,223, issued to Blum et al., the disclosure of which is 
hereby incorporated by reference as if fully set forth herein. The patent provides a system 
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5 and method for performing analysis and comparison of audio data files based upon the 
content of the data files. However, U.S. Patent No. 5,918,223 by itself does not address a 
comprehensive solution to regulating distribution of digital copyrighted works. Moreover, 
U.S. Patent No. 5,918,223 expressly relates to audio information, and does not address the 
similar but distinct problems with regulating online distribution of copyrighted works such 

10 as motion pictures, still images, games, software, and other media. 

Regarding movies, the transformation taking place in the motion picture industry 
from VHS video to digital DVD format has led to the spread of illegally shared copies of 
movies online. While a universal DVD encryption system has been adopted by the motion 
picture industry to block the online trade of illegal DVD content, as mentioned previously, 

1 5 decryption software such as De-Content Scrambling System (DeCSS) is readily available 
online. Moreover, technologies such as DivX allows users to take the decoded movie and 
copy the material onto a CD-ROM for home use through a standard Internet connection. 
The Motion Picture Association of America (MPAA) has moved aggressively to stop the 
illicit trade of movies online. The MPAA has sued online sites and chat rooms that offer 

20 pirated movies, as well as sites offering shared movie files, under the recently adopted 
Digital Millennium Copyright Act. 

With regard to images, photo communities are quickly becoming a favorite new tool 
of online users, as such communities allow users to post, print, and share their photos online 
with other subscribers. The explosive growth in digital camera use has greatly expanded the 

25 popularity of these photo communities. While many sites promote their usefulness in 

sharing family moments and other important events online, some estimates provide that, in 
reality, half of all images posted on these sites are copyright-protected images, and are being 
posted, printed and shared illegally. 

In summary, peer-to-peer file sharing technology offers unprecedented ease in 

30 exchanging information over digital networks. Unfortunately, this technology also permits 
intellectual property rights to be infringed on a widespread scale. Without a comprehensive 
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5 protection system in place to prevent further infringement of intellectual property rights, if 
intellectual property owners prevail in their ongoing legal battles against peer-to-peer 
providers, then the benefits of peer-to-peer file sharing may be lost to everyone. In light of 
all of the considerations discussed above, it would be desirable to provide a reliable and 
secure system for enabling intellectual property owners to distribute digital materials while 
1 0 preventing infringement of intellectual property rights. Preferably, such a system would 

permit intellectual property owners to choose whether distribution of particular works should 
be unrestricted, restricted, or disallowed entirely. 

BRIEF DESCRIPTION OF THE DRAWINGS 
1 5 FIG. 1 is a high-level schematic of a copyright protection system according to a first 

embodiment. 

FIG. 2 is a sample report generated by a copyright protection system, the report 
including several data fields useful to record a transmission transaction. 

FIG. 3 is component architecture schematic for a portion of a copyright protection 
20 system directed to monitoring a multi-session digital signal. 

FIG. 4 is a schematic of a copyright protection system including implementation 
details for content type recognition and identification, in accordance with a second 
embodiment. 

FIG. 5 is a schematic of a copyright protection system according to a third 
25 embodiment. 

FIG. 6 is a process flow diagram for a hierarchical method useful with a copyright 
protection system to assess whether a digital file contains a registered copyrighted work. 

FIG. 7 is a process flow diagram for obtaining and entering information useful to a 
copyright protection system into a database. 
30 FIG. 8 is a schematic of a copyright protection system having a distributed 

architecture for monitoring multiple watched networks. 
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5 FIG. 9 is a process flow diagram for a method of conducting a business enterprise 

through the provision of copyright protection services or a copyright protection system. 

FIG. 10 is a generalized data flow diagram for use with a Stochastic Audio Matching 
Mechanism. 

FIG. 1 1 is a process flow diagram for extracting feature vectors comprising Mel 
1 0 Frequency Cepstral Coefficients. 

FIG. 12a is a first portion of an annotated sequence diagram for extracting features 
from a digital audio work according to a Stochastic Audio Matching Mechanism. 

FIG. 12b is a second portion of the annotated sequence diagram of FIG. 12a. 

FIG. 13a is a graph plotting frequency versus time for a first musical piece performed 
15 by a first artist. 

FIG. 13b is a graph plotting frequency versus time for a second musical piece 
performed by a second artist. 

FIG. 14 is an annotated sequence diagram for generating a model from a digital 
audio work according to a Stochastic Audio Matching Mechanism. 
20 FIG. 15 is an annotated sequence diagram for identifying a digital audio work 

according to a Stochastic Audio Matching Mechanism. 

FIGS. 16-21 illustrate examples of screenshots that may be viewed by an intended 
recipient of unauthorized content in the context of a peer-to-peer file-sharing network. 

25 DETAILED DESCRIPTION 

FIG. 1 generally illustrates a copyright protection system ("CPS") 100 according to a 
first embodiment for monitoring a network segment 102 bearing at least one packet-based 
digital signal in accordance with one aspect of the CPS 100. In other aspects of the CPS 
100, the monitoring point for a data transaction may be at points other than a network 

30 segment of a communication. For example, the monitoring point of the CPS may be a server 
on a community website that monitors the uploads of audio, image, video or other digital 
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5 content. The same community website may alternatively monitor downloads of such data. 
Alternatively, the monitoring point may be a peer or client computer in a peer-to-peer file 
sharing network. In yet another embodiment, the CPS 100 may be integrated or linked with 
a search engine such as Excite® or Infoseek® that monitors search requests and performs 
one or more of the actions of monitoring, recording or blocking based on the nature of the 

1 0 request and the likelihood that it involves transacting copyright protected material. The 

network segment 102 is routed through a network appliance 104 that monitors digital signals 
borne by the segment 102. While FIG. 1 suggests that the network appliance 104 receives 
in-stream communications from the network segment 102, in other embodiments the 
network appliance 104 may alternatively receive mirrored data from a network. For an in- 

1 5 stream configuration such as is suggested by FIG. 1, each network appliance 104 would 
typically communicate with the network segment 102 through a router (not shown) having 
content recognition capability, such as routers commercially available from companies such 
as Cisco Systems or Alteon WebSystems (product information available at 
http://www.cisco.com and http://www.a1teonwebsystems.com . respectively). Preferably, 

20 any digital signals borne by the network segment 102 are periodically sampled to obtain a 
frame of sample data on each occasion. As noted in U.S. Patent 5,918,223, various window 
periods may be used for each frame, but each frame advantageously contains several 
milliseconds of data. A sampled frame is provided to a content recognizer 1 16, preferably 
part of the network appliance 104 that recognizes defined content types. Exemplary content 

25 types include .mp3, .avi, .asf, .ogg, but searching and recognition of practically any 

recognizable file type bearing audio, video, or image data, or digital text, or software, may 
be addressed by the content recognizer 116. 

Upon recognition of the appropriate file type, a sampled frame is then provided to a 
media analysis system 126. The purpose of the media analysis system 126 is to assess the 

30 content of a digital file. While content may be determined according to different methods, 
one desirable method is to use digital content-based fingerprinting if sufficient processing 
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5 resources are available. Preferably, a fingerprint is generated for the frame by the media 
analysis system 126 to aid in identifying the content of the frame. A generated fingerprint 
may then be compared with an archive of fingerprints for registered copyrighted works. 
"Registered copyrighted works" as used herein refers to digital works registered with or by a 
CPS provider or service provider. The existence of a fingerprint archive suggests that, in a 

1 0 preferred embodiment, copyrighted works should be registered with the provider of the CPS 
100, and reference fingerprints should be generated from registered copyrighted works, 
before seeking to detect the transmission of particular works in a network segment 102. If 
the comparison between the fingerprint of the frame and an archived fingerprint yields a 
match, thus signifying the transmission of a registered copyrighted work along the network 

1 5 segment 102, then transmission information is recorded in a content transmission recording 
device 110. 

As illustrated in the sample report provided in FIG. 2, several data fields identifying 
a transmission transaction may be recorded, including, for example, any one or more of the 
following: 

20 a) Source IP Address: the Internet Protocol (IP) address from which the 


recognized content was transmitted; 


b) 


Destination IP Address: the IP address to which the recognized content was 


transmitted; 


c) 


Date Transmitted: the date the recognized media was transmitted; 


25 


d) 


Time Transmitted: the time the recognized media was transmitted; 


Content / Media Name: The name or title of the content whether audio, 


video, still image, or other type; 


f) 


Artist Name: The name of the artist (when appropriate) if the work is a 


copyrighted work already registered with the CPS provider; 


30 


g) 


Album Name: The name of an album (if appropriate) associated with a 


registered copyrighted (e.g., audio) work; 
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5 h) Record Label: The name of an album (if appropriate) associated with a 

registered copyrighted (e.g., audio) work; 
i) Various Meta-Data: Distributor name, producer name, studio name, etc., 

such as may be found attached to a .id3 or .md5 file or tag associated with the 
copyrighted work; 

10 j) Unauthorized Count: The number of unauthorized downloads organized in 

various ways, such as by day, week, month, location, IP address, etc.; 
k) Redirected Count: The number of redirected attempted downloads organized 

in various ways, such as by day, week, month, location, IP address, etc. 
Referring back to FIG. 1, various components of the CPS 100 may be optionally 

1 5 located remotely to one another and connected by a network connection 107. For example, 
certain components such as the network appliance 104 and a content recognizer 116 may be 
located at a first monitored network data center 121, while the remaining components 126, 
146 may be located within a separate administrative network data center 123. FIG. 3 
illustrates a preferred embodiment of a component architecture for a portion 280 of a CPS 

20 100, such as the CPS 100 depicted in FIG. 1, the illustrated portion 280 being useful for 
monitoring a multi-session signal such as may be transmitted along a high bandwidth 
network segment. A high bandwidth network connection 262, preferably anticipated to 
operate at a convenient, commercially available speed, preferably greater than 28 kbps, 
communicates at least one packet-based digital signal to a first statefull session-binding load 

25 balancer 264 that separates the greater network stream into individual TCP or UDP sessions 
and binds those sessions to a specific processing unit (e.g., 268, 269, or 270) in the next 
layer. Connections 265, 266, 267 communicate individual network sessions to content-type 
recognition and identification servers 268, 269, 270, each having at least one processor. 
Each server 268, 269, 270, which preferably includes at least one processor, executes 

30 content-type recognition and content identification services. Within the servers 268, 269, 
270, the raw IP data packets are assembled (or re-assembled), the packets are analyzed for 
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5 presence of media types likely to contain copyrighted content using a content type 

recognition service, and the media content is identified using a content identifier service. 

Though not shown in FIG. 3, the servers 268, 269, 270 preferably have further 
connections (remote or local) to a stored data repository to facilitate content comparison 
with known identifiers for copyrighted content using one or more processors. From the 

1 0 servers 268, 269, 270, packets may be communicated to a second statefull session-binding 
load balancer 274 that reassembles the various separated packets into a single network 
stream 275. Use of a second load balancer 274 to reassemble the separated packets into a 
single network stream 275 is generally only necessary if the portion 280 of the CPS 100 
depicted in FIG. 3 is configured to operate in-stream. In such a case, the high bandwidth 

1 5 network connection 262 would typically be provided to the load balancer 264 by way of a 
router (not shown). Alternatively, if the CPS portion depicted in FIG. 3 receives mirrored 
network data, then the second load balancer 274 would be unnecessary, as there would be no 
need to reassemble separated packets into a single network stream 275 as the data is 
generally already streamed to its intended destination. Although not shown, additional 

20 redundant load balancers 264, 274, servers 268, 269, 270, and/or connections 265, 266, 267, 
271, 272, 273 may be provided to provide failover (backup) capability in case one or more 
primary devices should fail. 

FIG. 4 depicts a preferred embodiment of a detailed implementation of a CPS 100, 
200, omitting (for the sake of simplicity) load balancing devices such as are shown in FIG. 3 

25 to focus on a single session. An incoming network data stream 202 carrying at least one 
packet-based digital signal, preferably separated by session, is provided to a network 
appliance 204. The network appliance 204 may be characterized as a server, and the various 
operational blocks contained within the appliance 204 may be characterized as services, each 
amenable to at least partial performance in software routines. The network appliance 204 

30 includes at least one processor that, in conjunction with memory, operates software code for 
performing various operations on the digital signal. The processor may comprise any type 

10 
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5 of computer, and has processing characteristics dependent upon processing requirements for 
performing the various tasks discussed herein. It may comprise, e.g., a computer, such as a 
workstation including the type manufactured by Sun Microsystems, a main frame computer, 
or a personal computer such as the type manufactured by IBM® or Apple®. 

The term "processor," as used herein, refers to a wide variety of computational 

1 0 devices or means including, for example, using multiple processors that perform different 
processing tasks or have the same tasks distributed between processors. The processor(s) 
may be general purpose CPUs or special purpose processors such as are often conventionally 
used in digital signal processing systems. Further, multiple processors may be implemented 
in a server-client or other network configuration, as a pipeline array of processors, etc. 

1 5 Some or all of the processing is alternatively implemented with hard-wired circuitry such as 
an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or 
other logic device. In conjunction with the term "processor," the term "memory" refers to 
any storage medium that is accessible to a processor that meets the memory storage needs 
for a system or system component for performing the functions described herein. 

20 Preferably, the memory buffer is random access memory (RAM) that is directly accessed by 
the processor for ease in manipulating and processing selected portions of data. Preferably, 
the memory store comprises a hard disk or other non-volatile memory device or component. 

The network appliance 204 may be installed either in series with or receiving 
mirrored data from a high bandwidth network segment. Preferably, a packet input receiver 

25 206 accepts the input of a network data stream 202. Associated with the packet input 

receiver 204 is a TCP stream buffering / assembly service 206 that identifies the packet type 
of the input signal, and if the type is TCP, also provides storage buffering as needed and 
assembles the synchronized packet stream. Thereafter, a data extraction service 210 extracts 
the data from synchronized network packets, and then a data buffering service 212 

30 assembles and buffers the data from the incoming packets. 
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5 . Following data assembly and buffering, a content lookup service 214 communicates 

part or all of the data to a content type recognizer service 216. Any portion not 
communicated with the content type recognizer service 216 may be communicated instead to 
a packet output service or transmitter 250. The content type recognizer 216 preferably has 
multiple associated recognizers 218, 220, 222, 224, 225 to recognize file types of interest 

10 including, for example, .mp3, .avi, .asf, .ogg, and other types, respectively. 

Following content type recognition, packets are forwarded to a remote or local 
content identifier service 226 preferably having multiple associated identifiers 228, 230, 232, 
234, and 235 to identify content borne by file types of interest including, for example, .mp3, 
.avi, .asf, .ogg, and other types, respectively. Preferably, the content identifier service 226 is 

1 5 linked to a fingerprint generator service 240. While the fingerprint generator service 240 is 
illustrated as a distinct service from the content identifier 226, the two services optionally 
may advantageously be combined. Within the fingerprint generator 240, a content-based 
fingerprint comprising identifying features may be generated for a frame of data, and then 
forwarded to a content comparator 242. It may not be necessary to utilize a fingerprint 

20 generator 240 for identifying all digital files borne by the network data stream 202, as will 
be discussed hereinafter. Consequently, the content identifier 226 preferably includes a 
separate link to the content comparator 242 that is independent from the fingerprint 
generator 240. 

The content comparator 242 is in communication with a database 244 of stored 
25 content identifiers, preferably by a high-speed network connection. The database 244 
preferably includes database software such as is commercially available from Oracle® 
Corporation operating on one or more high-speed computers with expandable high-speed 
storage capability. The database 244 contains stored content-based identifiers, preferably 
including fingerprints, for copyrighted works registered with a CPS provider such as 
30 ipArchive™. For example, when a copyrighted song is registered with or by a CPS provider, 
the CPS provider would generate entries in the database 244 to assist in identifying the song, 
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5 preferably including at least one fingerprint from the song's content according to a CBID 
method, such as the method disclosed in U.S. Patent No. 5,918,223 issued to Blum et al. 
The CPS provider preferably indexes identifiers including fingerprints to registered works in 
the database 244. Fingerprints may be generated with a content identifier 226 with 
fingerprint generator 240, or with a media analysis system 326 such as provided in FIG. 5. 

1 0 Returning to the content comparator 242, its function is to compare a content 

identifier (such as, for example, a fingerprint generated by the fingerprint generator 240) 
from the incoming data stream 202 and query the database 244 for stored identifiers for 
registered copyrighted works, and then determine whether the incoming data stream 202 
matches with any archived content. If a match is found, then further actions may be 

15 necessary based on business rules associated with the identified content of, the data stream 
202. Information identifying users, destination addresses, and/or passwords authorized to 
receive registered copyrighted content may be stored with the database 244, or, more 
preferably, in a separate database (not shown) communicating with the content comparator 
242. The user/address/password information may be queried by the content comparator 242 

20 to determine whether the data stream 202 is authorized. 

A content transmission reporter 245 is preferably provided to record transmission 
information for copyright enforcement, record keeping, or other purposes. Information such 
as is listed above in connection with FIG. 2 may be stored, and reports such as the 
exemplary report provided in FIG. 2 may be generated. If the data stream 202 is not 

25 authorized, then one or more actions may be taken according to pre-defined business rules. 
Actions that might be taken according to pre-defined business rules, either separately or one 
or more in combination include, for example, recording, reporting and/or blocking a 
transmission, sending a generalized message to the source and/or recipient addresses 
involved with the unauthorized transaction, and sending a message informing a recipient 

30 address of (or redirecting a recipient address to) a commercial site where the desired 
copyrighted work may be purchased. 
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5 To facilitate messaging, a message generator 246 in communication with a packet 

output service or transmitter 250 is preferably provided. Preferably, messages are 
transmitted by way of an instant messaging protocol, such as the instant messenger 
associated with software distributed by www.napster.com . or AOL®. An alternative means 
for transmitting a message to a user is to send a message to a client application on the 
1 0 computer desktop of a user intended to receive the content, the client application including 
some communication capability. The CPS may detect an available client application, and 
then send the message accordingly. For example, the system may detect an Internet 
Explorer® on the user's desktop and send an HTML message to the user via the user's 
Internet Explorer®. 

15 A transmitted message preferably provides instructions, or, more preferably, a link to 

a commercial site, for purchasing a license to the copyrighted work. In one embodiment, the 
recipient of the message is provided the option of contesting the blocking of the content. If 
the recipient chooses to contest the block, a return message is sent to the CPS 100, which 
then may immediately commence transmission of the digital data to the recipient. 

20 Alternatively, the CPS 100 may forward the contested data stream for additional 

identification processing or to an administrator of the CPS for review. In one preferred 
embodiment, the recipient is provided a small sample of both the transmitted content and the 
content to which it matched to enable the recipient to make an evaluation of whether to 
contest the block. For example, if the content is an image, thumbnails of the image and the 

25 matched image may be presented to the recipient side by side on the recipient's browser. 

FIGS. 16-21 illustrate examples of screenshots that may be viewed by an intended 
recipient of unauthorized content in the context of a peer-to-peer file-sharing network. 
FIGS. 16 and 17 depict examples of screenshots as may be viewed by a user using a peer-to- 
peer file sharing client application (FIG. 16 for Napster and FIG. 17 for iMesh). The 

30 screenshots depict a list of songs that the intended recipient may choose to receive. In FIG. 
18, a file sharing client application (e.g., such as for Napster) includes a window that depicts 

14 
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5 the status of a file transfer. When the CPS intercedes in the transfer, the intended recipient 
may view a "Transfer error!" message on the client application. In one embodiment, this 
may be the complete and only message that is communicated b the CPS to the intended 
recipient. The intended recipient may not even be aware that the content has been 
affirmatively blocked, as the message may appear to indicate a communication problem or 

10 fault. Similarly, in FIG. 19, the message received is "Timed out!," which may or may not 
indicate to the content's intended recipient the reason for the failed transmission of the 
content. The specific action taken may depend on business rules associated with the content. 
The business rule may be construed to only report on the transmission and take no action to 
interfere with the transmission. 

15 FIGS. 20 and 21 depicts examples of screenshots of windows that, in one alternative 

embodiment, may be presented to an intended recipient of unauthorized content. The 
windows preferably provide options to the viewer for obtaining the desired content from 
other sources, which are authorized to distribute the desired content, although typically for a 
fee. 

20 Blocking or interrupting an unauthorized transmission may also be performed by 

way of the message generator 246, such as by transmitting a TCP/IP reset. This well-known 
technique is a form of IP spoofing in which the message generator 246 alternately gives the 
appearance that it is the transmitting device and the receiving device associated with a 
transaction, and then forges packets designed to terminate a TCP stream. According to this 

25 blocking method, an unlicensed and unauthorized destination address or recipient may be 
prevented from receiving full transmission of a specific registered copyrighted work. The 
forged packets are independent of any assembled content-bearing packets that may also be 
provided from the packet output service or transmitter 250 to a continued network traffic 
stream 260. 

30 As noted previously, a continued network stream 260 suggests that the network 

appliance 204 is installed in-stream along a network segment. However, the appliance 204 
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5 may also be configured to receive mirrored network data, in which case the need to continue 
transmission of reassembled packets through the packet output service or transmitter 250 to 
a continued network stream 260 may be reduced or obviated. FIG. 5 is a schematic 
representation of an alternative copyright protection system 300. An incoming network 
stream 302 connected to the Internet 301 is routed to a media recognition system 316 

1 0 provided at a network watchpoint. The media recognition system 316 includes an input 
receiver (not shown) for receiving an incoming network stream 302. If the media 
recognition system 316 is placed in-stream to capture all network communications, then an 
output transmitter (not shown) for transmitting the continued network stream 303 en route to 
a watched network 305 is preferably provided. The media recognition system 316 may also 

1 5 be configured to receive a mirrored network data stream according to conventional 

techniques. An in-stream approach requires additional, often expensive routing hardware 
(not shown), and may have a potential drawback of introducing latency into the monitored 
network stream. A potential benefit of an in-stream approach is that it may facilitate 
blocking of an entire transmission before any portion of it is transmitted to the watched 

20 network. The latter approach, implemented using mirrored network data, is preferred if it 
can be implemented at sufficient speed to render it effective at identifying and taking action 
against unauthorized transactions before such transactions are completed. 

Preferably, multiple networks may be monitored by the copyright protection system 
300 with additional media recognition systems 316 (such as embodied in the multiple 

25 network appliances 602, 604, 606, 608 shown in FIG. 8) each monitoring a segment of a 
different network but communicating with common analysis systems and/or a common 
transaction request broker. Each media recognition system 316 advantageously monitors a 
network 305 for traffic in digital files such as, for example, video, audio, image files and 
other digital content. 

30 If a file type of interest is detected by the media recognition system 316, then any 

portion of the signal bearing such a file may be provided to the content analysis system 326 
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5 to perform content identification. There, separate media analysis subsystems 328, 330, 332 
are provided for analyzing images, audio, and video or other media (including software) 
respectively. Image identification may be facilitated by use of the Ereo Exacta-Match 
system, developed by and commercially available from Ereo. Audio identification may be 
performed by application of the methods disclosed in U.S. Patent No. 5,918,223, issued to 
10 Blum et al. or alternatively with the Stochastic Audio Matching Mechanism (SAMM) 
discussed below. Video identification may be facilitated by applying one or both of the 
above-mentioned CBID methods to the audio portion of the video file, if any. Other digital 
works, such as digital text or software, may be identified by any number of methods as are 
known in the art. 

1 5 The media analysis system 326 preferably includes a capability of generating CBID 

fingerprints for digital media, whether such media is obtained from an incoming network 
stream 302 by way of the media recognition system 3 16, or obtained from a raw media 
storage service 340. Preferably, the media analysis system 326 also includes storage 
capability to store content identifiers or fingerprints for registered copyrighted works, such 

20 as may be stored in and forwarded by the raw media storage service 340. The media storage 
service 340 preferably contains a raw media storage archive or database 338 and a raw 
media storage system manager 339 for managing transactions with the archive or database 
338. 

Returning to the media analysis system 326, a further function of the system 326 is to 
25 compare identifiers, preferably including fingerprints, extracted from the network stream 
302 and from registered copyrighted works (such as are stored in the media storage service 
340) to determine whether the network stream 302 contains any registered copyrighted 
content. If the media analysis system 326 finds a match in making this comparison, then it 
may forward information regarding the transaction to a transaction database service 344. 
30 Within the transaction database service 344, a database 345 stores all media received 

by the media analysis system 326. The database 345 is preferably relational to facilitate 
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5 dimensional reporting, and preferably also permits high volume updates. A transaction 

recording and management service 343 is provided to manage queries to the database service 
344 and also to manage data recordation in the database 345. Preferably, a data enrichment 
service 347 in communication with the database service 344 is provided to facilitate either 
automatic or manual addition of information potentially useful to the CPS (such as according 

1 0 to the method provided in FIG. 7). 

A transaction reporting service 348, also is communication with the database service 
344, is preferably provided to define and execute queries for generating reports including, 
for example, the transaction information provided in FIG. 2. Preferably, transaction reports 
may be sold by the CPS provider to owners of copyrighted works to communicate 

1 5 information useful for maximizing opportunities and revenue from the copyrighted works. 
An urgent or scheduled report forwarding service 349 is preferably provided and in 
communication with the transaction reporting service 348 to coordinate generation of urgent 
or scheduled reports. Preferably, reports may be delivered by way of email or another 
active, preferably electronic, delivery system to a client 352. 

20 The transaction reporting service 348 is preferably in connection with a CPS 

transaction request broker service 350 that coordinates and manages various components of 
the CPS 300. The broker service 350 may be used to handle requests from the transaction 
reporting service 348, coordinate and/or manage operation of the media analysis system 326, 
handle requests of the transaction recording service 344, coordinate operations and data 

25 flows associated with the media storage service 340, and finally handle requests by and from 
the client 352. The client 352 preferably includes a web application interface providing 
access to intellectual property owners, reporting subscribers, and/or the community at large. 

Reference has been made in the foregoing discussions to identifying the presence of 
a copyrighted work in a digital signal by way of content-based fingerprints. Such a 

30 methodology (as was described, for example, in connection with FIG. 1) provides but one 
way of performing content identification. While the method described in connection with 
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5 FIG. 1 is highly accurate, it may not be optimal to apply such a method to all digital files 
borne by a network segment due to the generally processor-intensive nature of fingerprint 
generation and comparison. If a copyright protection method is applied in-stream to 
intercept network traffic, then ensuring rapid identification speed is desirable to minimize 
latency. 

1 0 Alternatively, if a copyright protection method is applied to mirrored network traffic, 

then it is important to ensure that content for a particular transaction in a registered 
copyrighted work is identified before the entire transaction is completed. For example, in 
the case of an unauthorized attempt to download a digital movie over a network, preferably 
the content of the movie is identified before the download is completed. Given limited 

15 processing resources, as traffic over a network increases, it may become difficult to generate 
and compare fingerprints for all network transactions with acceptable speed. Consequently, 
resort to a hierarchical method to assess the likely content of a digital signal being 
transmitted over a network may be desirable to ensure acceptable speed with finite 
processing resources. 

20 FIG. 6 illustrates one embodiment of a hierarchical identity assessment method 400 

that may be used in a CPS 100, 200, 300. A guiding principle of this method is to start with 
less processor-intensive steps to assess whether the monitored transmission contains a 
registered copyrighted work, and then to progress to more processor-intensive steps only if 
early steps do not indicate a match. Preferably, the method depicted in FIG. 6 is embedded 

25 in a software routing that may be operated on a computer processor, such as is contained in 
the network appliance 204 illustrated in FIG. 4. The method illustrated in FIG. 6 assumes 
that content type, file name, file size, IP addressing, any metadata, and/or watermarks may 
be discerned or extracted from a digital sample. Preferably, as a precursor to any assessment 
of the digital content that is transmitted, actions such as content blocking or content 

30 transmission reporting may be performed based on other aspects or attributes of the data 
stream. For example, an action may be taken based on the source IP address. Content 
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5 blocking, for example, may be performed based on protocol (e.g., Napster, Gnutella, etc.). 
Alternatively, content transmissions may be acted on based on the Internet Service Provider 
such as AOL®, used by the sender or the intended recipient of the content. 

Utilizing file naming as one assessment criterion, the first step 402 is to compare the 
file name of the sample to file names of registered copyrighted works contained in a 

10 database (such as the database 244 illustrated in FIG. 4). If the file name of the digital 

sample matches a name in the database, then a checking comparison step 404 is preferably 
performed to compare the file size for the digital sample to the expected file size of the 
registered copyrighted work bearing that name in the database. If both the file name and file 
size appear to match, then the likelihood that the digital sample contains a registered 

1 5 copyrighted work considered is high, and a file match may be established according to block 
422. Comparison of file names and file sizes is generally straightforward and does not 
consume substantial processing resources. Alternatively, the determination as to whether a 
match exists may be based only on the filename or the file size. 

If the file name and file size do not both match, then a second assessment criterion 

20 involving a history of unauthorized transactions from a particular source address is 

preferably applied, according to step 406. As discussed previously, information recording 
various aspects of transactions in copyrighted data may be maintained in a database, such as 
the database 244 illustrated in FIG. 4. Representative aspects that may be recorded include 
the source and recipient IP addresses, the type and identity of copyrighted files, and the 

25 number and frequency of transactions or attempted transactions. If a particular source IP 
address generates a history of unauthorized transactions, especially involving files of a 
certain type, then the likelihood is elevated that a data stream emanating from that source IP 
address contains unauthorized copyrighted material. Accordingly, steps 406 and 408 
examine whether a source IP address has a history of unauthorized transactions, and, if so, 

30 whether the file type and/or file size is consistent with past unauthorized transactions. If 
both questions are answered in the affirmative, then a file match may be established 
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5 according to block 422. Querying a database for suspect source IP addresses and file types 
and/or sizes implicated in past unauthorized transactions is generally less processing- 
intensive than generating and comparing content-based fingerprints. 

If examination of the source IP address and file type and/or size do not yield a likely 
match with a registered copyrighted work, then further assessment criteria using any present 

10 metadata or watermarks are preferably applied, according to steps 410-416. If metadata is 
present in the file according to step 410, and the metadata identifies a registered copyrighted 
work according to step 412, then a file match is preferably established according to block 
422. If either of these questions is answered in the negative, then preferably the following 
inquiry is whether the file contains a watermark according to step 414. If a watermark is 

1 5 present, and the watermark identifies a registered copyrighted work according to step 416, 
then a file match may be established according to block 422. Identification by way of 
metadata or a watermark may be performed by reference to archived data, such as may be 
stored in the database 244 illustrated in FIG. 4. Inquiring into the presence of metadata or 
watermark information and querying archived data to compare these identifiers is preferably 

20 performed in advance of fingerprinting to achieve desirable speed characteristics if 
processing resources are limited. 

If none of the foregoing assessment criteria indicate the likely presence of a 
registered copyrighted work, then a content-based fingerprint for a digital sample may be 
generated according to block 418. But even if one or more of the foregoing assessment 

25 criteria indicates a match with a registered copyrighted work, it may be desirable to check at 
least a portion of the matched results with a fingerprint identification method for validation 
purposes. That is, each of the foregoing assessment criteria provides only a probability that 
the unknown content contains a registered copyrighted work. Using fingerprinting 
techniques to check at least a portion of results matched according to other assessment 

30 methods may preferably provide feedback as to the effectiveness of a particular hierarchical 
identity assessment method. 
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5 As noted previously, identification by way of content-based fingerprints is highly 

accurate, but a primary downside in using fingerprinting is its high consumption of valuable 
processing resources. Following fingerprint generation, the fingerprint may be compared to 
an archive of identifiers for registered copyrighted works according to step 420. The 
archived identifiers may be stored in a database, such as the database 244 illustrated in FIG. 

10 4. If fingerprint comparison identifies a registered copyrighted work according to step 420, 
then a file match may be established according to block 422. Alternatively, if fingerprint 
comparison identifies no match according to block 424, then it may be concluded that the 
digital sample does not correspond to a registered copyrighted work. In such an instance, it 
is desirable to store the fingerprint in an archive, such as the database 345 illustrated in FIG. 

15 5, to enable retroactive reporting. That is, it may be desirable to monitor transactions in a 
particular digital work in case an owner of that work later desires to register it with the CPS 
provider and would like to obtain information regarding transactions in that work pre-dating 
registration of the work. Depending on the number, frequency, and/or timing of transactions 
in a particular work, a copyright owner may recognize the benefit of registering the work 

20 and/or choose one or more particular business rules to provide an appropriate and desirable 
level of copyright protection. 

When a copyright owner should decide to register a particular work with the CPS 
provider, one task for the CPS provider is to gather and/or enter potentially useful data 
corresponding to that work into a database or archive, such as the archive 338 illustrated in 

25 FIG. 5. This task may be generally described as data enrichment. Preferably, data 

enrichment is automated to the extent possible, but manual intervention may be desirable, 
such as to augment information available to an automated data enrichment service and/or to 
check and control the quality of automatically entered data. Numerous data fields may be 
useful in operating a CPS or providing copyright protection services in accordance with the 

30 present invention, such as, for example, file name, file size, a content-based fingerprint, 
commerce artist name, label name, album name, producer name, release date, and others. 
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5 FIG. 7 provides an example of a procedure for data enrichment. The first step 500 is 

to obtain the copyrighted work to be registered in digital form. The CPS provider may 
obtain digital files, for example, by way of transmission over a network such as the Internet, 
or by way of a portable digital storage medium such as a CD or DVD. If necessary, the CPS 
provider may receive an analog copy or a hard copy of a copyrighted work, such as a 

1 0 cassette tape or a photograph, and convert it to digital form. The next step 502 to generate a 
fingerprint, preferably for each discrete digital work. If an entire music album were 
provided to the CPS provider, then a separate fingerprint would preferably be generated for 
each song on that album to facilitate identification of individual songs by the CPS. 

A CPS may use Metadata. Inquiry into the presence of owner-supplied metadata 

1 5 may be performed according to step 504. Owner-supplied metadata, which may be found, 
for example, in a format such as an .id3 or .md5 file associated with the digital work, may be 
extracted according to block 506. Types of metadata that might be extracted include, for 
example, artist name, title of the song / movie / work, album name, company / owner name, 
producer name, release date, and similar information. If no owner-supplied metadata is 

20 present, then online metadata archives is preferably queried for the specified copyrighted 
work according to step 508. Examples of online metadata archives that may be queried for 
such information include "FreeDB" and "CDDB." If the online archives include metadata 
for the specified copyrighted work according to block 510, then the metadata is preferably 
extracted according to step 506 for use in the CPS. If no metadata is available for the work 

25 in such a database, then desired information may be added manually according to step 512. 
Following addition of metadata, any art associated with the work may be added to a CPS 
database, such as the archive 338 illustrated in FIG. 5. Such associated art may include, for 
example, an album cover for an audio work, a thumbnail of an image work, or movie art. 

Following addition of metadata information and associated art, preferably a query is 

30 performed to determine which commercial site or sites, if any, offer the particular 

copyrighted work for sale according to step 516. Preferably the commercial site(s) are 
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5 online websites, and more preferably websites affiliated with the CPS provider such as by 
contractual affiliation. Address information, preferably in the form of a URL, for 
commercial websites having the work for sale is then associated with the copyrighted work 
in a CPS database. A final step may be the addition of a "deep" link (such as a URL) or 
product code for purchasing the specified registered copyrighted work from the commercial 

1 0 site according to step 518. The foregoing information may be useful in facilitating 
commercial transactions in registered copyrighted works. 

FIG. 8 illustrates an implementation of a CPS 600 utilizing several network 
appliances 602, 604, 606, 608 distributed along network segments for several watched 
networks 612, 614, 616, 618. Each watched network 612, 614, 616, 618 connects to a 

1 5 distributed electronic network such as the Internet 620, and each network appliance 602, 

604, 606, 608 has access to digital data transmitted between each watched network 612, 614, 
616, 618, and the Internet 620. While a network appliance utilized with a CPS generally 
may operate either in-stream or mirrored along a network segment, the configuration 
illustrated in FIG. 8 illustrates network appliances 602, 604, 606, 608 configured to receive 

20 mirrored data transmitted between watched networks 612, 614, 616, 618 and the Internet 
620. Each network appliance is capable of communicating with a CPS network data center 
630, which preferably includes such devices as a transaction request broker service 632, a 
transaction recording and management service 634, a transaction database 636, a raw media 
storage service 644, and a raw media storage archive 646. The transaction request broker 

25 632 preferably routes and/or manages transactions between various components of the CPS, 
including various network appliances 602, 604, 606, 608. The transaction database 636 
stores information relating to transactions in digital works, with particular emphasis on 
unauthorized transactions in registered copyrighted works. The transaction recording and 
management service 634 provides an interface with the transaction database 636. The raw 

30 media storage archive 646 may be used to store information including digital works, such as 
those supplied by copyright owners or duplicated from traffic communicated between a 
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5 watched network 612, 614, 616, 618 and the Internet 620. The raw media storage archive 
646 may further store fingerprints generated from copyrighted works. The raw media 
storage service 644 provides an interface with the raw media storage archive 646. 

Each network appliance 602, 614, 606, 608 preferably includes a memory for 
receiving and storing content-based identifiers, including fingerprints. Preferably, each 

1 0 network appliance 602, 614, 606, 608 includes a processor to provide content type 

identification and content assessment capabilities. Each network appliance 602, 614, 606, 
608 may be periodically updated with new identifiers from the network data center 630, such 
as identifiers for newly registered copyrighted works. The distributed architecture of a CPS 
according to FIG. 8 facilitates rapid monitoring of high-bandwidth watched networks 612, 

15 614, 616, 618. Each network appliance 602, 604, 606, 608 may communicate with the 
network data center 630 by way of a public network such as the Internet, a virtual private 
network, a dedicated private network, or any combination of such connection types to 
promote system reliability in case one becomes inoperable. Additionally, while FIG. 8 
illustrates only a single network appliance at each watched network 612, 614, 616, 618, 

20 redundant network appliances may be provided at each location to enhance overall system 
reliability. 

Propagation and utilization of a CPS 100, 200, 300, 600 as disclosed herein enables 
novel methods of conducting a profitable business enterprise. FIG. 9 illustrates a business 
method 700 including steps that may be employed according to one or more CPS 

25 embodiments. The business method 700 illustrated in FIG. 9 is intended to present merely 
one example of novel business steps; non-obvious variants omitting certain steps, addition of 
further steps, and applying disclosed steps in a modified sequence are still contemplated to 
remain within the scope of the invention. 

The first step 702 provided in FIG. 9 is providing and/or maintaining a database (or 

30 "registry") of information identifying registered copyrighted works. Herein after a digital 
work which has been added to the database will be referred to as a "registered work" or 
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5 "registered copyrighted work". As new original works are being continuously created and 
owners of existing copyrighted works or operator of the CPS may elect to protect works by 
way of a CPS as disclosed herein, a database of identifiers should be designed to grow over 
time. A data enrichment method, such as that as discussed in conjunction with FIG. 7, is 
preferably applied to build and maintain the database according to this step 702. A revenue- 
10 generating step 704 includes the solicitation of payment in exchange for registering 
copyrighted works and/or providing copyright infringement protection services. This 
payment may be solicited by the provider from, for example, copyright owners individually, 
associations of copyright owners, network providers or operators, or any combination 
thereof. 

1 5 Providing copyright protection services according to the present invention generally 

includes monitoring a network or network segment for transmission of digital signals, as in 
step 706. Identification of files embodying file types of interest transmitted over the 
monitored network or network segment may be performed according to step 708. If a file 
type of interest is found, then one or many of various features may be extracted or generated 

20 from the file to facilitate content identification according to step 710. A comparison step 
712 is advantageously performed to determine whether the extracted or generated features 
match one or more identifiers contained in the database maintained according to step 702. If 
a match is made, then such a match indicates that the file from which the features were 
obtained contains a registered copyrighted work, as noted in step 714. 

25 A typical follow-up step 716 is to check whether transmission or receipt of the 

registered copyrighted work has been authorized by the copyright owner. Preferably, the 
CPS provider maintains a database that identifies authorized senders and/or receivers of 
digital copyrighted works, and that further includes preferences of the copyright owner for 
handling transactions in a copyrighted work. Determining whether a particular address is 

30 authorized to transmit and/or receive a registered copyrighted work may be performed by 
querying the database for such information. Regarding handling preferences, such 
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5 preferences may be used by the CPS provider to apply business rules to transactions or 
attempted transactions in registered copyrighted works. For example, some copyright 
owners such as software developers may distribute copyrighted material according to license 
agreements that expressly forbid digital transmission of the source code. Such owners might 
prefer to block all attempted digital transmission of these materials, and communicate this 

1 0 preference to the CPS provider. 

If upon application of step 716 it is determined that the transmission is not 
authorized, then information identifying the transaction may be recorded (such as in the 
transaction database illustrated in FIG. 8) according to step 718. Recorded information for 
an unauthorized transaction may include identifiers such as included in FIG. 2. Preferably, 

1 5 transaction reports, such as the report illustrated in FIG. 2, may be generated from some or 
all of the recorded information. As information contained in transaction reports may be 
valuable to copyright owners and others, for purposes including but not limited to marketing 
and seeking licensing revenue, such reports may be sold by the CPS provider in a further 
revenue generating step 724. 

20 The ability of generating transaction reports and/or blocking content provides 

additional revenue generation potential by affording businesses and organizations the 
opportunity to install the CPS on their networks or computers. A per-seat license may be 
offered to an organization or business to limit and/or monitor the transmission of content by 
its members and thereby limit the organization's or business' exposure to liability for 

25 unauthorized use of content. Similar to the way virus protection software may be installed 
on individual computers in a local area network of an organization, CPS client software may 
be installed to afford an organization or business copyright infringement protection. 

If transmission of the registered copyrighted work is authorized, then preferably 
lesser information regarding the transaction may be recorded, such as by incrementing a 

30 counter of transactions in the particular registered work, according to step 720. Preferably 
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5 less information is recorded in the case of an authorized, lawful transaction to respect the 
privacy rights of the sender and receiver. 

Following recordation of transaction information for an unauthorized transaction 
according to step 718, business rules may be applied to the transaction according to step 726. 
As mentioned above, the CPS provider preferably solicits preferences of copyright owners 

1 0 for handling unauthorized transactions in registered copyrighted works, and the CPS 

provider maintains a database recording those preferences. The preferences are preferably 
established at the time a work is registered with the CPS, so that business rules to be applied 
to a particular copyrighted work may be defined before detection by the CPS provider of an 
unauthorized transaction in a registered copyrighted work. As noted previously, business 

1 5 rules that might be applied include but are not limited to blocking unauthorized 

transmissions, sending a message to the source address and/or recipient address, referring the 
source address and/or recipient address to a commercial website, and/or recording 
transactions in copyrighted works carried by the monitored signal. A further revenue- 
generating step 728 may follow from the application of business rules, as the CPS provider 

20 may solicit payment for referrals to commercial sites, such as websites, where copyrighted 
works are available for sale and/or commissions for sales resulting from such referrals. 
Preferably, the CPS provider obtains an affiliation, such as by contract, with commercial 
sites to provide for referral and/or commission payments. Accordingly, the exemplary 
business method 700 provided in FIG. 9 provides multiple potential revenue streams to the 

25 CPS provider. 

Returning to the comparison step 712 wherein the features obtained from a sampled 
work were compared to identifiers contained in a CPS database, if no match is found, then it 
may be concluded that the digital sample does not correspond to a registered copyrighted 
work, as provided in step 730. Yet it may still be useful to record information relating to this 
30 work, to facilitate retroactive reporting in case a copyright owner later registers the work 

with the CPS provider and seeks information relating to its digital distribution. A fingerprint 
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5 may be generated from the unregistered work according to step 732. Thereafter, the 

fingerprint may be stored by the CPS provider in a database or archive such as the database 
646 provided in FIG. 8. Preferably, the database (such as database 646 of FIG. 8) is queried 
to compare the newly generated fingerprint to archived fingerprints for other unregistered 
works according to step 734. If a match is found from this query, then a transaction counter 

1 0 may be incremented to reflect the number of transactions in the particular work according to 
step 736. If no match is found, then the fingerprint is preferably added to the database of 
unregistered works. Regarding the capability of providing retroactive transaction reports, 
such information may be useful to the copyright owner in selecting particular preferences or 
business rules to be applied by the CPS provider to protect a copyrighted work following its 

1 5 registration. 

As noted previously, U.S. Patent No. 5,918,223 provides a method for performing 
analysis and comparison of audio data files based upon the content of the data files. An 
alternative method to that disclosed in U.S. Patent No. 5,918,223 for generating statistical 
models of digital audio recordings, which are used for probabilistic identification of 

20 unknown digital audio streams, is referred to herein as a Stochastic Audio Matching 

Mechanism (SAMM). If utilized, SAMM is preferably embodied in a software routine that 
may operated on a device such as a network appliance (e.g., network appliance 104 in FIG. 
1, network appliance 204 in FIG. 4, or network appliances 602-608 illustrated in FIG. 8). 
Discussed below are the mathematical and statistical concepts behind the SAMM system, as 

25 well as a description of one implementation of these concepts. 

SAMM is a process for generating statistical models of digital audio recordings and 
using these models for probabilistic identification of unknown digital audio streams. The 
creation of the models and the identification of unknown audio streams are separate 
functional processes, but they are logically tied together within the overall goal of audio 

30 identification. In practice, the use of SAMM involves the generation of many models for 
each audio item that is to be identified, and the storage of these models in a SAMM 
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5 database. Once the database has been constructed, unknown digital audio streams can be 
positively or negatively (no match found) identified within a known degree of accuracy 
using the SAMM database of audio models. SAMM encompasses two distinct functional 
processes of model generation and audio identification. 

It is important to reiterate that the SAMM process is a statistical tool, and that the 

1 0 identification of unknown audio streams is based on the probability that features the 

unknown audio exhibits matches the features from a known audio sample. A probability 
over a given threshold likely indicates that the unknown audio stream matches (corresponds) 
to the current known sample being compared against, while a probability under the given 
threshold indicates that the unknown audio stream does not match the current model being 

1 5 compared against. Since the system is probabilistic against a threshold, there are no 
absolutes when using this process. 

The model generation process and the audio identification process share a common 
set of data that they operate upon. These data are the features of the digital audio stream. A 
single feature is a collection of the representative characteristics of the audio stream at a 

20 single point in time (currently, about twenty characteristics per feature). Many features are 
extracted for a given audio stream, usually one hundred per second. Once the collection of 
the feature set is completed, SAMM can then generate a model for storage, or use the feature 
set to compare against known models. The details pertaining to feature extraction, model 
creation and audio matching are explained fully in the Process Detail section. 

25 A. SAMM Overview 

FIG. 10 provides a generalized description of the data flow within SAMM. Boxes 
792, 794, 796 represent the major processes of Feature Extraction, Model Generation, and 
Audio Matching, as will be described herein in further detail. The plain text in FIG. 10 
represents the input and outputs for each process. Raw audio input 791 is provided to the 

30 feature extraction process 792, which outputs extracted features 793 to both the model 

generation and audio matching processes 794, 796. The model generation process results in 
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5 creation of a model 795, while the audio matching process results in either positive or 
negative identification 797. 

B . S AMM Inputs and Outputs 
1. Feature Extraction 
a. Data Input 

1 0 Regardless of the desired SAMM functionality (i.e., model generation or audio 

stream identification), at least one feature, and preferably a collection of features, is 
generated from an initial digital audio stream, such as the raw audio data 791 illustrated in 
FIG. 10. This audio stream is therefore the initial data input for the system as a whole. Raw 
digital audio 791 coming into the system is preferably first decoded and down-sampled to a 

15 pulse code modulation (PCM) stream, such as at a frequency of 16 kHz. Typically, .mp3 
and CD quality audio streams are encoded at 44.1 kHz. This decompression and conversion 
may be performed by a readily available third party utility such as the Linux utility mpgl23. 
Once decompressed and converted, the PCM stream is assembled into a data array, which is 
the primary input into the Feature Extraction process 792. 

20 b. Parametric Input 

The statistical parameters used in feature extraction should be determined before the 
extraction process 792 occurs. The primary parameters used in the mathematical and 
statistical formulas used for feature extraction (discussed in further detail, infra) are 
summarized below with corresponding sample values for illustration purposes: 

25 • Sampling rate of the incoming PCM data (e.g., 16kHz). 

• Window length (which is a function of the sample rate). 

• Skip rate (which is a function of the sample rate). 

• Pre-emphasize constant (e.g., 0.97). 

• Filter bank count (e.g., 20) - this is the number of datum in a feature. 

30 • Filter bank channels (e.g., Filter bank count - 1) - number of computed Mel-Frequency 
Cepstral Coefficient (MFCC). 
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5 • Center frequency (e.g., Filter bank count + 2). 

These parameters are preferably set or calculated software, 
c. Feature Output 
The output of the Feature Extraction process 792 is a collection of feature vectors, 
the number of which is determined by the parametric input to the process. Each vector 
1 0 preferably consists of Filter bank count number of floats and this vector statistically 

represents the digital audio stream at a particular moment in time. The collection of features 
is treated within the software as an array of arrays (two-dimensional array) of floats, and this 
serves as the data input to the 2. Model Generation process 794 and 3. Model Matching 
process 796. 
15 2. Model Generation 

a. Data Input 

The input to the Model Generation process 794 is an array of an array of floats 
(collection of feature vectors 793) representing the audio to be modeled. This is the output 
of the 1. Feature Extraction process 792. 
20 b. Parametric Input 

The statistical parameters used in the extraction of features should be determined 
before execution of the Feature Extraction process 792. The primary parameters chosen for 
the mathematical and statistic formulas used in model generation are summarized below 
with corresponding sample values for illustration purposes: 
25 • Vector length (e.g., Filter bank count). 

• Mixture count (e.g., 8). 

• Max iterations (e.g., 15). 

• Max frames (e.g., 3000 - this corresponds to 100 frames per second for 30 seconds of 
audio). 

30 • Variance threshold (e.g., 0.001). 

These parameters are preferably set or calculated within software. 
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5 c. Model Output 

A generated model 795 is preferably a binary file containing statistical information 
about the raw audio 791 from which the original feature set was generated. The output 
model 795 is preferably stored in a "SAMM" database (such as, for example, the database 
338 illustrated in FIG. 5 or the database 646 illustrated in FIG. 8) for use in a model 
1 0 matching process 796. 

3. Model Matching 

a. Data Input 

The input to the model matching process 796 is preferably an array of an array of 
floats (collection of feature vectors 793) representing the audio to be identified. This is the 
15 output of the 1. Feature Extraction process 792. 

b. Model Matching Result 

Output from the model matching process 796 is preferably a textual representation of 
the identification result. If the feature set from a digital audio stream did not have a match 
against any model in a SAMM database, a "NO_MATCH" string may be returned. If the 

20 statistical attributes of the digital audio stream compare favorably against a model in a 

SAMM database, then the string "MATCH - <ID>" may be returned, where "<ID>" may be 
replaced by a unique SAMM database identification number of the model that the input 
matched with a degree of certainty. 
C. Process Detail 

25 1. Feature Extraction 

a. Concept Overview 
The primary goal of the feature extraction process 792 is to develop a representation 
of the acoustic signal suitable for classification. A good set of features for this problem 
should take into account the properties of the human ear while maintaining a high rate of 

30 data compression. Because the ear is most sensitive to changes in spectral magnitude and 
least sensitive to signal phase difference, the features used here preferably capture the 
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5 spectral shape of the signal over small "snap-shots" in time. In particular, the audio may be 
analyzed over small blocks of samples during which the signal is assumed to be short-time 
stationary (20 to 25 ms is reasonable for speech and most audio). Overlapping windowed 
segments of audio may be extracted at a rate of, for example, 100 snap-shots per second to 
produce a vectored feature stream for classification. Different extraction rates may be used. 
1 0 Each frame of audio consisting of approximately 25 ms of PCM samples (e.g., 400 samples 
@ 16kHz) may be converted into a multi-dimensional, preferably 20-dimensional, vector 
that encodes the spectral shape and relative-energy of the signal. The feature vector used in 
the audio classifier is described in further detail below. 


15 Observation vectors are computed periodically, preferably every 10 ms, from short-time 
windowed segments of audio data, preferably 25 ms in length. For each frame of audio, a 
multi-dimensional, preferably 20-dimensional, observation vector is extracted consisting of 
multiple Mel Frequency Cepstral Coefficients (MFCC), preferably 19 in number, and one 
normalized log-energy term. A block diagram of a MFCC feature extraction process 800 is 

20 illustrated in FIG. 11. In a first step 802, raw digital audio is decoded and down-sampled to 
a PCM stream, such as at a 16 kHz frequency. In a second step 804, short-time windowed 
segments are extracted from the down-sampled stream. According to a third step 806, a 
sampled frame is windowed. The feature extraction begins by pre-emphasizing the audio to 
remove glottal and lip radiation effects according to a fourth step 808. The pre-emphasis 

25 operation is implemented as a first order Finite Impulse Response (FIR) filter given by 


where z represents a one sample delay. Note that in the time-domain, the resulting signal is 
given by y(n) = s(n)- 0.97 s(n-l) where y(n) represents the pre-emphasized signal and s(n) 
30 represents the input signal. Next, the magnitude spectrum of the waveform is computed 
using the Discrete Fourier Transform (DFT) according to step 810. The linear frequency 


b. 


Mathemati c s/Stati stic s 


(Eqn. 1) 


H(z)=\ -0. 97 z~ l 
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5 axis is then warped onto the Mel scale according to step 812 in order to take into account the 
relationship between frequency and "perceived" pitch. The mapping between the linear 
frequency scale and Mel scale is given by 

(Eqn.l) /rae/ =25951og, 0 [u^J 

The warped magnitude spectrum is next passed through a bank of triangular-shaped 
1 0 filters that uniformly partition the Mel frequency scale into P regions according to step 814. 
Note that uniformity on the Mel frequency scale takes into account nonlinear sensitivity of 
the ear across frequency. For 16 kHz sampled audio, 20 filters (P=20) are used. The filter 
outputs generate a discrete set of P log-energy terms, (e[j],j=l..P). Let w/k] represent the 
weight of the jth filter to the kth discrete frequency of the sampled signal s(n) and let \S mel [k]\ 
15 represent the DFT magnitude spectrum of s(n) warped onto the Mel frequency scale. 

Assuming an N point DFT of the signal, the log-energy within the jth filter bank is given by, 

(Eqn. 2) log 2 ^g>v y [A:]- 1 5^ [t] |J for j = 1, 2, ... P 

Thereafter, the 19 MFCCs (c,[i], i=1..19) are computed for each excised frame of audio by 
20 decorrelating the filter outputs according to step 816 using the discrete cosine transform 
(DCT), 

(Eqn. 3) c,m = ^^e\j}cJ^(j-0.S)^ 

Finally removing the long-term mean from the features normalizes the MFCC parameters. 
25 This process, known as Cepstral Mean Normalization, helps to reduce the influence of 
channel mismatch on the excised features (e.g.., in song classification such mismatch can 
occur when different codecs are used to encode the signal or if frequency equalization is 
applied during the encoding process). The final 19 MFCCs are given by 
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(Eqn. 4) 


c,[fl = ££c,[i] 


The 19 dimensional vector is augmented with a normalized log-energy component, which is 
calculated for each frame of data. Finally, the log-energy term is calculated by first taking 
the log of the sum of the squared data samples. Let s,(n) represent the nth sample from the 
fth excised frame of audio. Assuming Ns samples per frame of audio, an initial frame-based 
energy term is computed as follows, 


The energy outputs are normalized to range between -5.0 and +1.0 and are 
augmented as the 20 th feature vector element. 

c. Feature Extraction Implementation 

The implementation discussions herein are intended is to provide a high-level 
mapping of the concepts and the mathematics for providing SAMM functionality sufficient 
to enable one skilled in the art to practice the inventive method. In furtherance of this goal, 
FIGS. 12a - 12b provide annotated sequence diagrams to additionally detail the program and 
data flow of the individual processes. The following sections are intended to discuss the 
inner workings of SAMM in the context of the concepts and mathematics at the object level. 

The implementation of SAMM is preferably performed in an object-oriented fashion, 
such as in the C++ programming language, thus all objects described in this section and in 
the following sections are C++ objects. 

A higher level calling entity 902 within the Media Analysis System initiates the 
process of feature extraction. A utility object within SAMM called the AudioAnalyzer 904, 
which is instantiated and managed by the calling entity, performs the actual extraction. The 
calling entity 902 is also responsible for managing the collection of raw data from which 
features are extracted, and managing the AudioAnalyzer' s use of this data. The calling 


(Eqn. 5) 
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5 entity executes various member functions on the AudioAnalyzer 904 with the ultimate goal 
being the extraction of features from raw audio data and the storage of this collection of 
features as a data member within the AudioAnalyzer object 904. Once populated with the 
features, the AudioAnalyzer object 904 is used as the storage and wrapper of the features as 
they are used in the process of model generation of audio identification. 
10 2. Media Model Generation 

a. Concept Overview 
It is assumed that perceptual difference between audio music and other audio media 
are primarily manifested by the characteristics of the signal's spectrum. This is illustrated in 
FIGS. 13a-13b for two segments of audio from music pieces. Here, frequency is plotted 
1 5 along the y-axis while time is plotted along the x-axis. In FIG. 13a we see distinct 

sequences of spectral patterns emerge as the song progresses from piano key strokes through 
a percussion hit through finally a sequence of synthesizer key strokes. In FIG. 13b we see 
other patterns manifested when a singer vocalizes a word in the song. In FIG. 13a a piano 
keystroke leads into a percussion followed by synthesizer notes. In FIG. 13b an artist 
20 sustains vocalization while singing. It is assumed that the sequences of spectral patterns 
across the time-sequence of the audio represent the signature or "footprint" of the song. 
Modeling the spectral characteristics of each audio clip allows one to distinguish between 
artists and music pieces. 

Ideally, one would prefer to model the trajectory of the spectral events in order to 
25 capture the evolution of the audio over time. However, it is pointed out that the explicit 
modeling of temporal events leads to a classification algorithm which is susceptible to 
performance degradations when the signal is shifted in time. 

The proposed algorithm considered here assumes that the sequence of features 
extracted from the song is statistically independent. Under this assumption, the likelihood of 
30 observing a feature vector x t at time t is not dependent on the feature vector jc / _ 1 extracted at 
time t-1 or any other time for that matter. In other words, the likelihood of observing 
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5 sequence of T feature vectors, X = fa,x 2 ,...,x T } given a model X for an audio segment can 
be expressed as, 

(Eqn.6) p(X\k)=f[p{x,\X) 

Eqn. 6 states that the likelihood of observing a sequence of feature vectors given a 
particular model for a music clip is based on the product of the individual likelihood of 
1 0 observing each feature vector excised from the clip. At 100 feature vectors per second of 

audio, complete computation of Eqn. 6 for 30 seconds of a song would require the product of 
T=3000 individual likelihoods. Note that since repeated multiplication of many numbers 
smaller than 1 leads to numerical underflow problems on most PC hardware. Therefore, the 
likelihood in Eqn. 6 is generally expressed in terms of its log-likelihood, 

15 

(Eqn. 7) \ogp{X \k)=J j \ogp{x l \X) 

The basic concept behind the audio modeling scheme is that each song under 
consideration can be modeled by characterizing the statistical distribution of the feature 

20 vectors excised from an example of the song. In doing so, the audio modeling scheme 
becomes less sensitive to slight alterations in the features. Such alterations can be 
experienced due to differences in audio codecs, time-shifts in the signal, sampling rate, etc. 
Unlike audio "fingerprinting" schemes that try to find an exact match of the audio to a 
known model, the statistical approach returns the likelihood or probability that the observed 

25 set of features were generated by a model, X. Therefore given a set of S modeled songs, 

fy ] ,A 2 ,...A, s }, and an unknown audio clip with excised feature sequence, X -^c u x 2 ,...,x T }, 
the goal of the search is to find the model X s with the maximum likelihood of generating the 
unknown feature sequence. The song associated with this model is assumed to best match 
the unknown. In other words, 
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5 (Eqn. 8) A J = arg max {log p{X\X s )) 

Of course, Eqn. 8 assumes that the feature sequence X = §c u x 2 ,...,x r ] was 
generated from at least one of the known S songs in the database. A case in which the test 
sequence is outside of a known database will be considered, infra. 
b. Mathematics/Statistics 

10 It is assumed that the feature vector sequence X - {x, ,x 2 , . . x r } is statistically 

independent and generated from a random process consisting of a linear combination of 
Gaussian basis functions. Models of this type are known as Gaussian Mixture Models 
(GMMs). GMMs have been used in the past for problems such as Speaker Identification and 
Language Identification. A Gaussian Mixture Model characterizes the likelihood of 

1 5 observing a feature vector x as a weighted combination of Gaussians: 

i. The Gaussian Mixture Model 
(Eqn. 9) pOM^IX 

20 

where b m (x) is the multivariate Gaussian density. For a D-dimensional feature vector, 
b m (x) can be expressed as, 

(Eqn. 10) b m (x)= ^ exp|-^-/I.)S^-/Z,)j 

25 

Here, p, m and S ra represents the vector mean and covariance of the mth Gaussian 
density respectively. Further, the weights for the Gaussian functions follow the sum-to-one 
property, 
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5 


(Eqn. 11) 



For data sparsity and speed issues, the covariance matrix in the model is assumed to 
be diagonal, i.e., all elements off the diagonal are zero-valued. Therefore, our model 
consists of M mixture weights, mean vectors, and covariance matrices. Typically numbers 
10 of mixtures needed to accurately model a song range between M=10 and M=32. 

ii. Parameter Estimation 
Estimation of the model parameters is based on the Expectation-Maximization (EM) 
algorithm A. Dempster, N. Laird, and D. Rubin, "Maximum Likelihood from Incomplete 

1 5 Data Via the EM Algorithm," J. Royal Stat. Soc. , Vol. 39, pp. 1-38, 1977, and L. Baum et 
al., "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic 
Functions of Markov Chains," Ann. Math. Stat., Vol. 41, pp. 164-171, 1970, both references 
of which are incorporated by reference as though fully set forth herein. A practical 
application of the update equations can be found in D. Reynolds, R. Rose, "Robust Text 

20 Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE 

Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 72-83,Jan. 1995, which is 
incorporated by reference as though fully set forth herein. The parameter estimation 
algorithm is iterative. At each iteration, a new set of model parameters are determined 
which increase the total likelihood of the training patterns against the current model. In 

25 general between 6-10 iterations of the parameter update equations are required before model 
convergence. 


(I) Initialization: The M mean vectors of the model are initialize 


30 


to randomly chosen data vectors in the training set of T vectors, 

■X = &,x 2 ,...,x T }. The M covariance vectors are initialized to have 


unit variance for each feature element and mixture weights are 


initialized to have equal weighting (i.e., w m - MM ). 
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(2) Iterative Update: Assuming diagonal covariance matrices, the 
observation probability, b m (x, ), can be expressed by, 
(Eqn. 123) 

)= X nr— ex P - ^ ^[,1 

^rjn^w 1 7 m J 


(J) Likelihood: Let 1 3c ( , A) represent the a posteriori 
probability of the mth modeled Gaussian given feature vector x, , 

(Eqn. 14) p{m\x„X)= 

The update equations for the mixture weights, mean vectors, and 
diagonal-covariance matrices can then be expressed as, 

Mixture weight update 

1 T 

(Eqn. 135) w m = -^p(m\x„X) 
Mean vector update 

T 

(Eqn. 146) ft m = 

5>(m|x,,A) 

Diagonal-Covariance update 


41 


WO 02/082271 


PCT/US02/10615 


T 


5 


(Eqn. 157) 0 2 m = 


jp(m|x,,A) 


(4) Check Likelihood: The total likelihood ((Eqn. 7) of the data 


iteration i should be greater than that at iteration i-1. Note that over- 


iterating can reduce the performance of the classifier. 


10 


iii. 


Practical Considerations 


There are several practical ranges for the parameters that can be observed during 
model estimation in order to determine whether or not the convergence criteria for the 
iterative EM algorithm are satisfied. While absolute criterion that total likelihood of the data 
against model should increase at each iteration, the following parameter ranges should be 
1 5 maintained, 


iv. Notes on Algorithmic Efficiency for Likelihood Calculations 
Computation of the likelihood of an individual feature vector against a known model 
20 is generally expressed in the log-domain to avoid numerical underflow problems, 


0 < w m < 1 

a I >0 
0<6 ra (3c,)<l 


(Eqn. 18) 



As mentioned, we can expand Eqn. 18 by inserting Eqn. 13 for b m (x t ): 


25 


(Eqn. 19) 



Assuming diagonal covariance matrices, Eqn. 19 becomes, 
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5 


(Eqn. 20) \ogp(x | A)= log. 



2h ° 2 M 


Evaluation of Eqn. 20 requires M exp operations, 3D+M multiplies, and one log 
operation. In general, we observe that one Gaussian tends to dominate the likelihood 
computation. Therefore, if it is assumed that only one Gaussian contributes significantly and 
10 the remaining M-l Gaussians have zero-probability, it can be shown that the expression in 
Eqn. 20 can be approximate as follows, 


Here C m is a mixture-density dependent constant that can be pre-computed at run-time, 

15 


Further computational savings for Eqn. 21 can be obtained using partial distance elimination 
(PDE) and feature component reordering (FCR) as described in B. Pellom, R. Sarikaya, J. 

20 Hansen, "Fast Likelihood Computation Techniques in Nearest-Neighbor based search for 
Continuous Speech Recognition," submitted to IEEE Signal Processing Letters. The basic 
idea of partial distance elimination is to compute Eqn. 21 for the first mixture Gaussian 
(m=l) in its entirety and only partially compute Eqn. 21 for the remaining mixtures. Note 
here that since Eqn. 21 seeks to determine the mixture component which maximizes the 

25 expression on the left-hand-side (LHS) of the equation, the summation over the D vector 
elements can be prematurely stopped as soon as the partial accumulation falls below that of 
the best-scoring mixture. The end result is that we compute the entire equation for at least 
one of the Gaussian basis functions but only partially compute the expression for some or all 
remaining mixtures. The PDE algorithm is guaranteed to give the same output value as the 


(Eqn. 21) 



(Eqn. 22) C m ^\og(w m )-^.\og(27t)-^log^ 2 m \j]) 
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5 complete computation of Eqn. 21 (i.e., if Eqn. 21 were to be computed as shown). Alone, 
PDE reduces the computation by 10% based on empirical simulations. 

The effectiveness of the PDE algorithm can be enhanced when combined with 
feature component reordering (FCR). FCR seeks to re-order the sequence of features 
computed in the summation term in Eqn. 21 such that the partial summation more quickly 
1 0 approximates the true value of likelihood computed over all the elements. The re-ordering 
of the feature sequence (i.e., j /(_/')) is determined empirically from observed data. FCR 
combined with PDE reduces the computation of Eqn. 21 by 30% based on empirical 
simulations. Note that PDE and FCR both assume that the "nearest-neighbor" 
approximation for log-likelihood calculations is used. 
15 c. Model Generation Implementation 

FIG. 14 is an annotated sequence diagram describing the process of model generation 
within SAMM. A calling entity 902 initiates model creation via the use of the 
AudioModeler object 906. The inputs required for the AudioModeler object 906 are an 
AudioAnalyzer object 902, which contains the set of features to be modeled, and a reference 
20 to the model to be created. This reference is passed to the AudioModeler object 906, and the 
model is created in-situ. 

3. Media Identification 

a. Concept Overview 

The goal of the media identification algorithm is decide whether or not the audio 
25 material under test matches one of the S songs modeled by the system. If the system decides 
that the audio is from one of the modeled songs in the database, the identifier must provide a 
classification of which song the material is from. 

b. Mathematics/Statistics 

The media identification task can be cast as a binary hypothesis problem. Under 
30 hypothesis HI we conclude that the audio under consideration was emitted from one of the 
known models X a (s=l..S). Under hypothesis HO, we conclude that the audio was not 
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5 emitted from any of the known modeled songs. The optimal processor for the binary 
hypothesis problem is the likelihood-ratio test, 

In other words, we compare the ratio of probabilities that the feature sequence X was emitted 
1 0 from known model A f against the probability that the feature sequence was emitted from an 
unknown source A- (i.e., a song not in the database). The resulting ratio is compared to a 
decision threshold © . If the ratio falls below the threshold, we conclude hypothesis HO, 
otherwise we conclude hypothesis HI. In the log-domain, the log-likelihood ratio processor 
becomes, 

15 (Eqn.24) \ogp(X\A s ) - \ogp{X\X,) S lo g 6 

obtained from song obtained from 
in finite database song outside of 
modeled database 

It is clear that the first term on the LHS of Eqn. 24 can be expressed as a linear combination 
of Gaussian basis functions, estimated from the song under consideration for the test. 
However, the model A ? that characterizes the HO hypothesis is not so clearly defined. 
20 Currently, our solution is to model A F using the top N nearest models to X excluding X s . 
Eqn. 24 becomes, 

(Eqn. 25) \o%p{X \ A„,) -l^log^X | A. f= „ ) | logG 

obtained from song 1 — ^— — K — — ' 

in finite database now obtained from 
songs inside of 
modeled database 

Here, A v=1 is used to denote the model with the highest-likelihood for the unknown 

test observation sequence X and A ( ,„ (n=2..N+l) is used to denote the next N top scoring 

25 models for same test observation sequence. The goal here is that the model for condition HO 

should model the case of "any" song present while the first term in Eqn. 25 should model the 

case of a "particular" song we are interested in. In the next section, we consider the search 
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5 mechanism. Note that the second normalization term in Eqn. 25 has been used for problems 
such as Speaker Voice Verification. This technique is sometimes referred to as "cohort 
normalization". 

i. Search Algorithm 

Given an unknown sequence of T excised feature vectors, X = {x, , x 2 , . . . , x T }, and 
10 known modeled set of 5 song titles with associated GMMs {A ] ,X 2 ,...A S }, the search is 
defined as follows: 

(1) Initialization: Initialize the accumulated log-likelihoods C[s] 
of all song models to zero. All songs are assumed to be active and 
potential candidates for X. 

15 (2) Update: For each active song model, pickup one feature vector 

from the stream at time instant t and update the log-likelihood of each 
song model, 

C[s] = C[s] + \ogp(x,\A s ) 

20 ( 3) Prune: After a sufficient block of features have been 

examined, prune a fraction of the remaining models that have the 
lowest log-likelihood score C[s]. If fewer than jV+1 models remain, 
do not prune and models (since they are required to compute Eqn. 25). 
(4) Repeat: Repeat steps 2 and 3 until N+l models remain or all 

25 feature vectors in the stream have been consumed. 

ii. Verification Algorithm 

Utilizing the N+l models with the largest log-likelihood, we hypothesize that the 
model with the absolute highest likelihood is the song representing the unknown feature 
sequence. We test this hypothesis using the likelihood ratio test (Eqn. 25). If the computed 
30 log-likelihood ratio falls below the threshold, we assume that the unknown is not a modeled 
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5 song in our database. Otherwise, the best matching model (i.e., the one with the highest 
likelihood) is assumed to be the song that represents the unknown (our match). 

c. Media Identification Implementation 
FIG. 15 is an annotated sequence diagram describing the process of media 
10 identification within SAMM. The implementation of the 3. Media Identification process is 
similar to 1. Feature and 2. Media Model Generation. A calling entity 902 initiates the 
identification process via the use of the AudioSearch object 908. The inputs required for the 
AudioSearch object 902 are an AudioAnalyzer object 904, which contains the set of features 
to be searched, and a reference to the in-memory database used to store all the known 
1 5 models against which SAMM is comparing the unknown audio input. 

Accordingly, novel systems and methods for protecting digital works have been 
disclosed. While embodiments and applications of the invention have been shown and 
described, it would be apparent to those skilled in the art that many more modifications are 
possible without departing from the inventive concepts herein. The invention, therefore, is 
20 not to be restricted except in the spirit of the appended claims. 


25 


30 


47 


WO 02/082271 


PCT/US02/10615 


What is claimed is: 

1 . A method of identifying transmissions of digital works comprising: 

maintaining a registry of information permitting identification of digital works; 
monitoring a network for transmission of at least one packet-based digital signal; 
extracting at least one feature from the at least one digital signal; 
comparing the extracted at least one feature with registry information to assess 

whether the at least one digital signal includes a transmission of at least one 

portion of a registered digital work; and 
taking action based on the result of the comparison 

2. The method of claim 1, further comprising the step of initiating an action based on 
the result of the comparison step, wherein the action is in accordance with at least 
one pre-defined business rule(s). 

3. The method of claim 1, wherein said action comprises the step of recording 
information identifying transmissions of digital works. 

4. The method of claim 1, further comprising the step of extracting the recipient address 
from the digital transmission and determining whether an address is authorized to 
receive a registered work included in the at least one digital signal. 

5. The method of claim 4, further comprising the step of initiating an action based on 
the result of authorization determination step, wherein the action is in accordance 
with pre-defined business rules. 

6. The method of claim 1, further comprising the step of extracting the source address 
from the digital transmission and determining whether an address is authorized to 
transmit a registered work included in the at least one digital signal. 

7. The method of claim 6, further comprising the step of initiating an action based on 
the result of authorization identification step, wherein the action is in accordance 
with pre-defined business rules. 
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5 8. The method of claim 2, wherein the at least one business rules is selected from the 
set consisting of: interrupting transmission of the at least one digital signal; 
providing a message to the recipient address; providing a message to the source 
address; referring the recipient address to a commercial site where the registered 
digital work may be purchased; forwarding an advertisement to the recipient address; 
1 0 recording information about the transmission; and reporting information about the 

transmission. 

9. The method of claim 2, wherein the at least one business rules is selected from the 
set consisting of: inserting different digital content into the transaction; or replacing 
the digital content with other digital content. 
15 10. The method of claim 1, wherein the at least one digital signal comprises one or more 
of digital audio information, digital video information, digital text, software, or 
digital image information. 
1 1 . The method of claim 1 , further comprising the step of preparing reports including 
information on transmissions of digital works. 
20 12. The method of claim 1 , further comprising the step of network traffic analysis. 
13. The method of claim 1, wherein the at least one feature extracted from the digital 

signal is selected from the group consisting of: file name, file size, file type, source 
address, recipient address, metadata identifiers, a watermark, a file hash, protocol 
type, text content, and enough data to generate a content-based fingerprint. 
25 14. The method of claim 1 , further comprising the step of generating a content-based 
fingerprint from the at least one digital signal. 
15. The method of claim 1, wherein the registry includes content-based fingerprints 

which represent digital works, and the comparison step includes the comparison of 
fingerprint(s) generated from the at least one digital signal with fingerprints in the 
30 registry. 
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The method of claim 1, wherein the assessment of whether the comparison of at least 
one extracted feature with registry information is performed according to a hierarchy 
of comparison criteria. 

The method of claim 1, wherein the registry is composed of information pertaining to 
copyrighted works. 

The method of claim 1, wherein the action taken is to implement a strategy to cache 
certain digital content in the network . 

The method of claim 18, wherein the action taken is based on geographic or network 

topologic information associated with the destination of the transmission. 

The method of claim 1, wherein the at least one portion comprises information that 

represents the entire digital work. 

A digital works identification system comprising: 

a first memory for storing information permitting identification of digital works; 

a digital input receiver for receiving at least one packet-based digital signal 
transmitted over a network; 

a second memory for storing at least one portion of the at least one packet-based 
digital signal; and 

at least one processor for obtaining at least one feature from the at least one 
digital signal and comparing the at least one feature with the stored 
identifying information. 

The method of claim 21, wherein the system is intended to protect copyrighted 

works. 

The method of claim 21, wherein the system is intended to identify new digital 
works. 

The system of claim 21, wherein the at least one feature includes data sufficient to 
generate a content-based fingerprint from the content of the at least one digital signal. 
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The system of claim 21, wherein the at least one feature includes feature vectors 
generated from the content of the at least one digital signal. 

The system of claim 21, wherein the at least one digital signal comprises one or more 
of digital audio information, digital video information, digital representation of text, 
software, and digital image information. 

The system of claim 21, wherein the at least one processor identifies digital works 
carried by the at least one packet-based digital signal. 

The system of claim 21, wherein the at least one packet-based digital signal is 
characterized by at least one of: a source address; or a destination address, 
the system of claim 21, further comprising a third memory for recording the 
identified transmissions. 

The system of claim 29, wherein the third memory further includes a transaction 


The system of claim 29, further comprising a report generator for generating reports 
on transmissions carried by the at least one packet-based digital signal. 
The system of claim 28, further comprising a transaction request broker for 
identifying whether a destination address is authorized to receive the content of the at 
least one digital signal. 

The system of claim 32, further comprising means for disabling transmission of at 
least a portion of the at least one packet-based digital signal if the destination address 
is not authorized to receive the content of the at least one digital signal. 
The system of claim 28, further comprising a transaction request broker for 
identifying whether a source address is authorized to transmit the content of the at 
least one digital signal. 

The system of claim 34, further comprising means for disabling transmission of at 
least a portion of the at least one packet-based digital signal if the source address is 
not authorized to transmit the content of the at least one digital signal. 
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5 36. The system of claim 28, further comprising a message generator for sending a 

message to the source address or destination address upon detection of an identified 
transmission. 

37. The system of claim 36, wherein the message directs a destination address to a 

commercial site where authorization to obtain the desired content may be purchased. 
1 0 38. The system of claim 28, wherein other content is inserted into the transmission. 

39. The system of claim 28, wherein other content is substituted into the transmission. 

40. The system of claim 21, wherein the at least one processor includes: 

a content type recognizer for identifying the file type of a packet-based digital 
signal. 

15 41 . The system of claim 21, wherein the at least one processor includes: 

a content identifier for identifying the content of a packet-based digital signal. 

42. A digital works identification system comprising: 

a first database containing features of digital works; 

a digital input receiver for receiving at least one digital signal characterized by a 
20 source address and a recipient address; 

a feature extractor for extracting features from the digital signal; and 
a feature comparator for comparing features extracted from the digital signal 
against features contained in the database. 

43. The system of claim 42, wherein the database further contains authorization 

25 indicators indicating whether an intended recipient is authorized to receive particular 

content. 

44. The system of claim 42, wherein the database further contains authorization 
indicators indicating whether an intended source is authorized to transmit a particular 
digital work. 

30 45. A method for conducting a business enterprise, the method comprising the steps of: 
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maintaining a first database for storing features permitting the identification of 

registered digital works; 
monitoring a network for transmission of at least one digital signal characterized 

by a source and recipient address; 
extracting at least one feature from the at least one digital signal; 
comparing the extracted at least one feature with features from the database to 

determine whether the digital signal includes a registered digital work; and 
taking action based on business rules. 

46. The method of claim 45 wherein said taking action comprises recording information 
identifying transmissions of registered digital works. 

47. The method of claim 45, further comprising the step of determining whether a 
recipient address is authorized to receive transmission of a registered digital work. 

48. The method of claim 45, further comprising the step of initiating an action based on 
the result of the identification step, wherein the action is in accordance with pre- 
defined business rules. 

49. The method of claim 45, further comprising the step of generating at least one 
feature of a registered digitial work, wherein the at least one feature includes feature 
vectors. 

50. The method of claim 45, wherein the at least one feature includes source address. 

5 1 . The method of claim 45, wherein the at least one feature includes destination 
address. 

52. The method of claim 45, wherein the at least one digital signal comprises one or 
more of audio information, video information, image information, digital 
representation of text, or software. 

53. The method of claim 45, further comprising the step of soliciting owners of digital 
works to register digital works with the business enterprise for inclusion in the first 
database. 
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5 54. The method of claim 45, further comprising the step of obtaining value from owners 
of digital works in exchange for registration of digital works in the first database. 
55. The method of claim 45, further comprising the step of obtaining value from network 
providers or network operators in exchange for providing a copyright protection 
system or copyright protection services. 
1 0 56. The method of claim 47, further comprising the step of referring an intended 

recipient not authorized to receive a registered digital work to a commercial site 
where authorization to receive the registered digital work may be purchased. 
57. The method of claim 56, further comprising the step of soliciting value from 

commercial sites to which intended recipients are referred by the business enterprise. 
15 58. The method of claim 47, further comprising the step of precluding an unauthorized 
recipient address from receiving uninterrupted transmission of a registered digital 
work. 

59. The method of claim 45, further comprising the step of determining whether a source 
address is authorized to transmit a registered digital work. 
20 60. The method of claim 59, further comprising the step of precluding an unauthorized 
source address from sending uninterrupted transmission of a registered digital work. 

61 . The method of claim 47, further comprising the step of transmitting a message to an 
unauthorized recipient address. 

62. The method of claim 59, further comprising the step of transmitting a message to an 
25 unauthorized source address. 

63. The method of claim 45, further comprising the step of inserting digital content into 
the transmission. 

64. The method of claim 45, further comprising the step of substituting digital content 
into the transmission. 
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65. The method of claim 61, wherein the message includes instructions by which an 
unauthorized recipient address may purchase authorization to receive a registered 
digital work. 

66. The method of claim 45, wherein the identifiers of registered digital works are 
generated from the content of the digital works. 

67. The method of claim 45, further comprising the step of maintaining a second 
database recording transmissions of digital works over the network. 

68. The method of claim 67 wherein the second database includes recipient addresses. 

69. The method of claim 67, wherein the second database includes source addresses. 

70. The method of claim 67, further comprising the step of preparing reports including 
transaction information. 

7 1 . The method of claim 67, further comprising the step of selling the reports. 

72. The method of claim 45, wherein the network includes a connection to the Internet. 

73. The method of claim 45, further comprising the step of maintaining a second 
database recording information identifying transmissions in digital works over the 
network. 

74. A method for altering the behavior of a device based on the transmissions of digital 
works comprising: 

maintaining a registry of information permitting identification of digital works; 
monitoring a network for transmission of at least one packet-based digital signal; 
extracting at least one feature from the at least one digital signal; 
comparing the extracted at least one feature with registry information to assess 

whether the at least one digital signal includes a transmission of a registered 

digital work; 

accumulating statistical measurements of the features of the transmitted digital 
works; and 
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5 altering the functional behavior of the device based on the comparison of the 

transmitted work 

75. The method of claim 74, wherein the device is a network component. 

76. The method of claim 74, wherein the functional behavior that is altered is allocating 
bandwidth to different categories of network use. 

10 77. The method of claim 74, wherein a database of frequently identified digital works is 
maintained such that these works could be inserted into the digital stream in place of 
the original source transmission. 

78. A registry of digital works comprising: 

an identifier to link a digital work to associated information about the work; 
15 at least one feature of the digital work permitting the digital work to be identified; 

business rules that indicate how the digital work may be transmitted over a digital 
stream. 

79. The method of claim 78, wherein the at least one feature extracted from the digital 
signal is selected from the group consisting of: file name, file size, file type, source 

20 address, recipient address, metadata identifiers, a watermark, a file hash, protocol 

type, text content, and a content-based fingerprint. 

80. The method of claim 78, wherein the business rules indicate whether a particular 
address or range of addresses is authorized to transmit said digital work. 

81 . The method of claim 78, wherein the business rules indicate whether a particular 
25 address or range of addresses is authorized to receive said digital work. 
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observeData (rawData) 


preEmphasize (rawData) 

Performs the high-pass filter as dictated by 

Eqn 4.1 . Results in pre-emphasized data. 


buildSpeechFramea (preEmphasizedData) 
Constructs a vector of frames (frameVector). 


SpeechFrame 


Instantiates 
with 

preEmpData 










computeWindowedFrame () 
Constructs the window frame of 
25ms of audio. Corresponds to 
Block 3 of Figure 4.1. 

computeLogEnergy ( ) 
Calculates the log energy of the 
current frame. This function 
uses Eqn 4.6. 


Add current frame (SpeechFrame) to frameVector. 

} 
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computeMFCC (preEmphasizedData) 
Management function for calculating the 
MFCC matrix for all frames in frameVector. 


:ulatePilterBank() 

Management function for 
construction of the filter bank 
(block 7, Figure 4J). 

f ourierTransf orm () 
Performs the fast Fourier 
Transform on the frame data 
corresponding to Block 5 in 
Figure 4.1 

f ourierHelper () 
Warps to Mel scale via 
Eqn4.2. 


Performs the Discrete 
Cosine Transform on 
MFCC data via Eqn 4.4. 


for(all frames) { 

• construct feature vector of 19 MFCCs 

• insert 1* element using the log energy as 
computed in computeLogEnergy O • 
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trainModel [model, features) 

Management function for model generation. A reference to the model 
to be generated is an argument, along with the features that will 
ultimately construct the model. 


Initialize model. 


GMMTrainO 

Initializes and estimates parameters for GMMs. 


initGMMTrainO 

Initializes the mean, covariance, and weighting vector 
(according to Eqn. 4.12}. 


GMMTrainEM ( ) 

Iterative parameter updating as described Section 
4.2.2.2 

for( every GMM) { 


gausaProb (GMM) 

Updates observation probability 

using Eqn. 4.13. 


} 

• Updates probability using Eqn. 4.14. 

• Updates Mean vector using Eqn. 4.16. 

• Updates covariance using Eqn. 4.17. 

• Updates weighting vector using Eqn. 4.15. 


Impart state on model. 
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Instantiates 
with address 
of model 
database. 


findMatch (features) 

Management function for determining the existence of a likely match 
of the features to a model in the database. Returns the 
AudioModelScore of the matched model, or null if no match found. 


Initialize the score of every model in database. This corresponds to 
Step 1 in Section 4.3.2.1 


for(featurcs) { 


Select random feature vector 
from list of features. 


for( every model in database) { 


AudioModelScore : : updateScore ( feature) 
Updates the overall score of the current model with the 
results of comparing this feature. Corresponds to Step 2 
in Section 4.3.2.1. 


Add model to the pruner. This corresponds to Step 
3 in Section 4.3.2.1. 


is described in Step 4 of Section 


Determine existence of match. This corresponds to Eqn. 4.25. If 
match is found, return the AudioModelScore of the likely model 
against which the features matched. If no match found, return a null 
AudioModelScore pointer. 
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